-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic metric for Containers on Windows #1
Conversation
if err := nc.osSetenv(goPSUtilProcDirEnv, hostProc); err != nil { | ||
return nil, fmt.Errorf("NodeCapacity cannot set goPSUtilProcDirEnv to %s: %w", hostProc, err) | ||
// On Windows, hostProc is empty | ||
if hostProc != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: While the current approach effectively handles the condition, a way to potentially improve readability could be to introduce a variable like isLinux
.
For example:
var isLinux = hostProc != ""
if isLinux {
// Existing logic...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be better to have _win/_lin specific files for some of these implementation ones. gopsutil does this
https://github.com/shirou/gopsutil/blob/master/process/process_windows.go
Another simple way but be to have the method call windows or linux sub-implementations within the same file. Having them split out is nice because then you are not required to have the win/lin impls cross-platform compile
package host | ||
|
||
// These variables are invalid for Windows | ||
const ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another nit: This works, but could there be other more readable (and maintainable) ways to refactor this and enable the ability of having OS-specific behaviour or properties? Not sure how the codebase is what might work best so would rely on your good judgement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw similar pattern of defining OS specific const in repo.
@@ -0,0 +1,106 @@ | |||
package k8swindows | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can we ignore building this on Linux?
//go:build windows
// +build windows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, addressed!
return result | ||
} | ||
metrics = k.decorateMetrics(metrics) | ||
for _, cadvisorMetric := range metrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can the metric name in iteration be something like k8s_summary_metric
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
tags[ci.AutoScalingGroupNameKey] = c.hostInfo.GetAutoScalingGroupName() | ||
|
||
// add tags for EKS | ||
tags[ci.ClusterNameKey] = c.hostInfo.GetClusterName() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Seems like you covered the critical tags. Additional tags to consider if not already added. Might edit and add more as I think of more ideas
- platform e.g. Windows or Linux
- availability zone and region
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
information about region is already part metrics internally, information about OS will be included in future PRs when refining metric labels
hostIP := os.Getenv("HOST_IP") | ||
kclient, err := kubeletutil.NewKubeletClient(hostIP, ci.KubeSecurePort, logger) | ||
if err != nil { | ||
return nil, errors.New("failed to initialize kubelet client") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To improve error handling and make debugging easier, can we please include the original error details so that a user will know what happened and how to rectify it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
} | ||
|
||
func new(logger *zap.Logger, info host.Info) (*kubeletSummaryProvider, error) { | ||
hostIP := os.Getenv("HOST_IP") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be helpful to check if HOST_IP
was set correctly or is this checked later in the code?
Maybe we can add
if hostIP == "" {
// return error or print warning if value is optional
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before adding this validator, I want to verify a usecase of what happens when HOST_IP
is not set. Based on results, i can add this additional check. if it fails, then i will add validation but it works then this is not required.
Does the project contain any unit/integration tests? Will these be added later? |
|
||
nodeCPUCores := k.hostInfo.GetNumCores() | ||
for _, pod := range summary.Pods { | ||
k.logger.Info(fmt.Sprintf("pod summary %v", pod.PodRef.Name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Is there any way we can also show how many containers are running in the pod? If yes, and you agree it might be useful, lets add it. If not then nevermind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This information will be collected in separate metrics field but not exactly the number of containers rather cpu, memory metrics for each container in pod.
@@ -197,7 +209,9 @@ func (acir *awsContainerInsightReceiver) Shutdown(context.Context) error { | |||
// collectData collects container stats from Amazon ECS Task Metadata Endpoint | |||
func (acir *awsContainerInsightReceiver) collectData(ctx context.Context) error { | |||
var mds []pmetric.Metrics | |||
if acir.cadvisor == nil && acir.k8sapiserver == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Similar to earlier feedback, maybe something like this might be more readable?
isLinux := acir.k8swindows != nil
isWindows := acir.cadvisor == nil && acir.k8sapiserver == nil && acir.k8swindows == nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will address this in future PR since this complex logic might not be required.
Yes, unit tests will be part of this repo but integ are part of different. |
you should branch off of aws-cwa-dev in the contributing repo and merge this to a new branch in that repo instead of merging it into your fork. This will be helpful for having multiple developers and you can take advantage of the workflows and Amazon's github license or whatever for running those workflows |
General comment, your implementation for windows CI is really an alternate implementation to the cAdvisor code. cAdvisor as a name is probably over-used, we could rename to kubelet or kubeletStats or kubeletProvider or nodeStats or whatever but I would avoid creating a whole new "windows" thing when you essentially just need to create an alternate implementation of the existing cAdvisor "provider" This will reduce the code sprawl and minimize the number of windows/linux checks you have to make |
@@ -33,7 +33,7 @@ func (c *CPUMetricExtractor) GetValue(info *cInfo.ContainerInfo, mInfo CPUMemInf | |||
|
|||
// When there is more than one stats point, always use the last one | |||
curStats := GetStats(info) | |||
metric := newCadvisorMetric(containerType, c.logger) | |||
metric := NewCadvisorMetric(containerType, c.logger) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intereseting! I wonder if we need to make the concept of a "cAdvisor metric" more generic, basically it is a container insights metric or container insights system metric. Then that could sit in internal
and be shared by both the cAdvisor and windows impl.
For now though I think it is fine to just do this, that is a decent refactor for not a ton of value other than better semantics
@@ -44,7 +44,8 @@ type CAdvisorMetric struct { | |||
logger *zap.Logger | |||
} | |||
|
|||
func newCadvisorMetric(mType string, logger *zap.Logger) *CAdvisorMetric { | |||
// NewCadvisorMetric is public because used in k8swindows package, outside cadvisor package. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit - I would delete this comment, this is useful in a diff but I don't think it is useful for this comment to live on forever. Packages/methods don't need to detail why they are public or private or whatever
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed, removed the comment!
} | ||
if err := nc.osSetenv(goPSUtilProcDirEnv, hostProc); err != nil { | ||
return nil, fmt.Errorf("NodeCapacity cannot set goPSUtilProcDirEnv to %s: %w", hostProc, err) | ||
// On Windows, hostProc is empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - I don't love this comment. The check is for hostProc, not windows/linux. The code should be evident that it needs a hostProc, all the reasons why it may not have a valid hostproc (i.e. windows) is not relevant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
if err := nc.osSetenv(goPSUtilProcDirEnv, hostProc); err != nil { | ||
return nil, fmt.Errorf("NodeCapacity cannot set goPSUtilProcDirEnv to %s: %w", hostProc, err) | ||
// On Windows, hostProc is empty | ||
if hostProc != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be better to have _win/_lin specific files for some of these implementation ones. gopsutil does this
https://github.com/shirou/gopsutil/blob/master/process/process_windows.go
Another simple way but be to have the method call windows or linux sub-implementations within the same file. Having them split out is nice because then you are not required to have the win/lin impls cross-platform compile
@@ -9,6 +9,7 @@ import ( | |||
|
|||
"go.uber.org/zap" | |||
corev1 "k8s.io/api/core/v1" | |||
stats "k8s.io/kubelet/pkg/apis/stats/v1alpha1" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any idea on the govuln check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it something about being alpha?
if err != nil { | ||
return err | ||
} | ||
// On Windows, K8sWindows provider is used which internally uses kubelet summary API for metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is an unnecessary comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
@@ -197,7 +209,9 @@ func (acir *awsContainerInsightReceiver) Shutdown(context.Context) error { | |||
// collectData collects container stats from Amazon ECS Task Metadata Endpoint | |||
func (acir *awsContainerInsightReceiver) collectData(ctx context.Context) error { | |||
var mds []pmetric.Metrics | |||
if acir.cadvisor == nil && acir.k8sapiserver == nil { | |||
// On linux, acir.k8swindows will be nil. | |||
// On Windows, both acir.cadvisor and acir.k8sapiserver will be nil. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a useful comment, it is basically documenting your assumption about the entry criteria
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed.
@@ -39,6 +41,7 @@ type awsContainerInsightReceiver struct { | |||
cancel context.CancelFunc | |||
cadvisor metricsProvider | |||
k8sapiserver metricsProvider | |||
k8swindows metricsProvider |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my two cents - k8swindows and cadvisor are the same, no reason to not just populate cadvisor with two different impls, one impl for windows and one impl for linux (actually using cadvisor).
Then you don't need special code in the receiver for k8swindows.GetMetrics()
I am open to renaming it, kubeletProvider or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
I have removed k8swindows
from this struct. Renamed cadvisor
field to containerMetricProvider
and reused it for both cadvisor and k8sWindows implementation.
…emetry#24676) **Description:** The metadata.yml for the SSH check receiver currently documents a resource attribute containing the SSH endpoint but this is not emitted. This PR updates the receiver to include this resource attribute. **Link to tracking Issue:** open-telemetry#24441 **Testing:** Example collector config: ```yaml receivers: sshcheck: endpoint: 13.245.150.131:22 username: ec2-user key_file: /Users/dewald.dejager/.ssh/sandbox.pem collection_interval: 15s known_hosts: /Users/dewald.dejager/.ssh/known_hosts ignore_host_key: false resource_attributes: "ssh.endpoint": enabled: true exporters: logging: verbosity: detailed prometheus: endpoint: 0.0.0.0:8081 resource_to_telemetry_conversion: enabled: true service: pipelines: metrics: receivers: [sshcheck] exporters: [logging, prometheus] ``` The log output looks like this: ``` 2023-07-30T16:52:38.724+0200 info MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "logging", "resource metrics": 1, "metrics": 2, "data points": 2} 2023-07-30T16:52:38.724+0200 info ResourceMetrics #0 Resource SchemaURL: Resource attributes: -> ssh.endpoint: Str(13.245.150.131:22) ScopeMetrics #0 ScopeMetrics SchemaURL: InstrumentationScope otelcol/sshcheckreceiver 0.82.0-dev Metric #0 Descriptor: -> Name: sshcheck.duration -> Description: Measures the duration of SSH connection. -> Unit: ms -> DataType: Gauge NumberDataPoints #0 StartTimestamp: 2023-07-30 14:52:22.381672 +0000 UTC Timestamp: 2023-07-30 14:52:38.404003 +0000 UTC Value: 319 Metric #1 Descriptor: -> Name: sshcheck.status -> Description: 1 if the SSH client successfully connected, otherwise 0. -> Unit: 1 -> DataType: Sum -> IsMonotonic: false -> AggregationTemporality: Cumulative NumberDataPoints #0 StartTimestamp: 2023-07-30 14:52:22.381672 +0000 UTC Timestamp: 2023-07-30 14:52:38.404003 +0000 UTC Value: 1 ``` And the Prometheus metrics look like this: ``` # HELP sshcheck_duration Measures the duration of SSH connection. # TYPE sshcheck_duration gauge sshcheck_duration{ssh_endpoint="13.245.150.131:22"} 311 # HELP sshcheck_status 1 if the SSH client successfully connected, otherwise 0. # TYPE sshcheck_status gauge sshcheck_status{ssh_endpoint="13.245.150.131:22"} 1 ```
) **Description:** Adding command line argument `--status-code` to `telemetrygen traces`, which accepts `(Unset,Error,Ok)` (case sensitive) or the enum equivalent of `(0,1,2)`. Running ```shell telemetrygen traces --otlp-insecure --traces 1 --status-code 1 ``` against a minimal local collector yields ```txt 2023-07-29T21:27:57.862+0100 info ResourceSpans #0 Resource SchemaURL: https://opentelemetry.io/schemas/1.4.0 Resource attributes: -> service.name: Str(telemetrygen) ScopeSpans #0 ScopeSpans SchemaURL: InstrumentationScope telemetrygen Span #0 Trace ID : f6dc4be32c78b9999c69d504a79e68c1 Parent ID : 4e2cd6e0e90cf2ea ID : 20835413e32d26a5 Name : okey-dokey Kind : Server Start time : 2023-07-29 20:27:57.861602 +0000 UTC End time : 2023-07-29 20:27:57.861726 +0000 UTC Status code : Error Status message : Attributes: -> net.peer.ip: Str(1.2.3.4) -> peer.service: Str(telemetrygen-client) Span #1 Trace ID : f6dc4be32c78b9999c69d504a79e68c1 Parent ID : ID : 4e2cd6e0e90cf2ea Name : lets-go Kind : Client Start time : 2023-07-29 20:27:57.861584 +0000 UTC End time : 2023-07-29 20:27:57.861726 +0000 UTC Status code : Error Status message : Attributes: -> net.peer.ip: Str(1.2.3.4) -> peer.service: Str(telemetrygen-server) ``` and similarly (the string version) ```shell telemetrygen traces --otlp-insecure --traces 1 --status-code '"Ok"' ``` produces ```txt Resource SchemaURL: https://opentelemetry.io/schemas/1.4.0 Resource attributes: -> service.name: Str(telemetrygen) ScopeSpans #0 ScopeSpans SchemaURL: InstrumentationScope telemetrygen Span #0 Trace ID : dfd830da170acfe567b12f87685d7917 Parent ID : 8e15b390dc6a1ccc ID : 165c300130532072 Name : okey-dokey Kind : Server Start time : 2023-07-29 20:29:16.026965 +0000 UTC End time : 2023-07-29 20:29:16.027089 +0000 UTC Status code : Ok Status message : Attributes: -> net.peer.ip: Str(1.2.3.4) -> peer.service: Str(telemetrygen-client) Span #1 Trace ID : dfd830da170acfe567b12f87685d7917 Parent ID : ID : 8e15b390dc6a1ccc Name : lets-go Kind : Client Start time : 2023-07-29 20:29:16.026956 +0000 UTC End time : 2023-07-29 20:29:16.027089 +0000 UTC Status code : Ok Status message : Attributes: -> net.peer.ip: Str(1.2.3.4) -> peer.service: Str(telemetrygen-server) ``` The default is `Unset` which is the current behaviour. **Link to tracking Issue:** 24286 **Testing:** Added unit tests which covers both valid and invalid inputs. **Documentation:** Command line arguments are self documenting via the usage info in the flag. Co-authored-by: Pablo Baeyens <[email protected]>
This PR defines code structure for metric provider which works on Windows. ### Changelog 1. Changed receiver.go in awscontainerinsights to run for Windows with metric provider. 2. Added summary API in kubeletclient 3. Add kubeletProvider to return metrics at different levels i.e. pod, contianer, node. 4. Updated hostInfo providers to run for Windows. 5. Updated ebsVolume Info provider to run for Windows. ## Todos: 1. Define correct ebsVolume Info provider for Windows 2. Change logic around k8s leader election to run for Windows
b7a7513
to
d575b52
Compare
|
bd19de8
to
97d5c39
Compare
Description:
Add pod level metric collection for Windows
Link to tracking Issue:
Testing:
Tested locally and it worked
Documentation: